Unsupervised Construction of a Multilingual WordNet from Parallel Corpora

نویسندگان

  • Dimitar Kazakov
  • Ahmad R. Shahid
چکیده

This paper outlines an approach to the unsupervised construction from unannotated parallel corpora of a lexical semantic resource akin to WordNet. The paper also describes how this resource can be used to add lexical semantic tags to the text corpus at hand. Finally, we discuss the possibility to add some of the predicates typical for WordNet to its automatically constructed multilingual version, and the ways in which the success of this approach can be measured.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Developing Parallel Sense-tagged Corpora with Wordnets

Semantically annotated corpora play an important role in natural language processing. This paper presents the results of a pilot study on building a sense-tagged parallel corpus, part of ongoing construction of aligned corpora for four languages (English, Chinese, Japanese, and Indonesian) in four domains (story, essay, news, and tourism) from the NTU-Multilingual Corpus. Each subcorpus is firs...

متن کامل

Spoken to Spoken vs. Spoken to Written: Corpus Approach to Exploring Interpreting and Subtitling

issue of Polibits includes a selection of papers related to the topic of processing of semantic information. Processing of semantic information involves usage of methods and technologies that help machines to understand the meaning of information. These methods automatically perform analysis, extraction, generation, interpretation, and annotation of information contained on the Web, corpus, nat...

متن کامل

Leveraging Parallel Corpora and Existing Wordnets for Automatic Construction of the Slovene Wordnet

The paper reports on a series of experiments conducted in order to test the feasibility of automatically generating synsets for Slovene wordnet. The resources used were the multilingual parallel corpus of George Orwell’s Nineteen Eighty-Four and wordnets for several languages. First, the corpus was word-aligned to obtain multilingual lexicons and then these lexicons were compared to the wordnet...

متن کامل

1381 - 2416 , Volume 12 , Combined 2 - 3

This article describes two different word sense disambiguation (WSD) systems, one applicable to parallel corpora and requiring aligned wordnets and the other one, knowledge poorer, albeit more relevant for real applications, relying on unsupervised learning methods and only monolingual data (text and wordnet). Comparing performances of word sense disambiguation systems is a very difficult evalu...

متن کامل

Retrieving Lexical Semantics from Multilingual Corpora

This paper presents a technique to build a lexical resource used for annotation of parallel corpora where the tags can be seen as multilingual ‘synsets’. The approach can be extended to add relationships between these synsets that are akin to WordNet relationships of synonymy and hypernymy. The paper also discusses how the success of this approach can be measured. The reported results are for E...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009